The mutual information of a single-layer perceptron with $N$ Gaussian
inputs and $P$ deterministic binary outputs is studied by numerical
simulations. The relevant parameters of the problem are the ratio between the
number of output and input units, $\alpha = P/N$, and those describing the
two-point correlations between inputs. The main motivation of this work is
to compare the replica computation of the mutual information with an
analytical solution valid up to $\alpha \sim O(1)$. The most relevant results
are: (1) the simulation supports the validity of the analytical prediction,
and (2) it also verifies a previously proposed conjecture that the replica
solution interpolates well between large and small values of $\alpha$.
1 Introduction

The extraction of sensory information by the brain from a stream of
multi-dimensional data may be understood as a process of optimisation of mutual
information (MI) [?] or of redundancy [?]. The MI measures the statistical
dependence between two random variables [?]. In our case, they correspond
to an $N$-dimensional input signal $\vec\xi$ provided by the sensory receptors, and
a $P$-dimensional output $\vec v$. More precisely, the MI indicates the amount of
knowledge about $\vec\xi$ that can be extracted from $\vec v$ (see e.g. ref. [?]). With
respective probability densities $p_\xi$ and $p_v$, this is given by

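$$ I[p_v, p_\xi] = \left\langle \log \frac{p_{v,\xi}}{p_v\, p_\xi} \right\rangle_{v,\xi}, \qquad (1) $$

where $\log(x) = \ln x / \ln 2$. The average $\langle \dots \rangle_{v,\xi}$ is over the joint probability
distribution $p_{v,\xi}(\vec v, \vec\xi)$. For instance, if $\vec v$ and $\vec\xi$ are independent, we have
$p_{v,\xi} = p_v\, p_\xi$ and so $I = 0$.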
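
As a toy illustration of eq. (1), the sketch below evaluates the same average for a small discrete joint distribution instead of the continuous densities used in the text. The function name `mutual_information_bits` and the two example tables are hypothetical, and the base-2 logarithm is an assumption made so that the result is expressed in bits.

```python
import numpy as np

def mutual_information_bits(joint):
    """MI in bits of a discrete joint distribution given as a 2-D table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                  # normalise to a probability table
    p_row = joint.sum(axis=1, keepdims=True)     # marginal of the first variable
    p_col = joint.sum(axis=0, keepdims=True)     # marginal of the second variable
    product = p_row @ p_col                      # what the joint would be if independent
    mask = joint > 0                             # terms with zero probability contribute 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

# Hypothetical examples: an independent pair and a perfectly copied bit.
independent = np.outer([0.3, 0.7], [0.5, 0.5])
copy = np.array([[0.5, 0.0], [0.0, 0.5]])

print(mutual_information_bits(independent))  # numerically close to 0
print(mutual_information_bits(copy))         # 1.0
```

The independent table reproduces the statement that a factorised joint distribution gives zero MI, while the copied bit carries exactly one bit of information.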

The problem of learning the statistical properties of a set of $N$-dimensional
correlated Gaussian inputs is well understood for a linear channel, even in the
presence of noise (which is actually necessary to regularise the MI) [?]-[?]. A
non-linear continuous channel has also been studied in the low-noise limit for rather
general transfer functions [?]. It was shown that maximisation of the MI leads
to a factorial code. Threshold-linear networks [?, ?] (treated with the replica
technique) have also been considered. On the other hand, the binary channel,
where the outputs take discrete values (say $v_\mu = \pm 1$), is not well understood,
although the problem has been studied using replica-symmetric (RS)
statistical-mechanical techniques [10]-[12]. Most interestingly, an analytical solution has
been found, and the existence of a large-order phase transition as the number
of output neurons increases has been suggested [?]. The relevant parameter
describing this transition is the ratio between the number of output
and input units, $\alpha = P/N$. The analytical solution holds up to a value of $\alpha$
of order one, beyond which it is no longer correct. In contrast, the RS
solution does not exhibit any transition and gives good approximations in both
the small and large $\alpha$ regimes. It has been proved that below some $\alpha \sim 1$ the
analytical and the RS solutions are very close. In fact, an expansion in powers
of $\alpha$ shows that the two solutions are identical up to $O(\alpha^2)$. Although
the corresponding expansions differ from the third order on, the numerical
agreement up to $\alpha \sim 1$ is excellent (a relative difference of less than 0.9% up to
$\alpha = 0.1$). This is due to intriguing cancellations between higher orders.

Here we present simulations with the aim of providing numerical evidence
for the validity of the analytical solution. In addition, since the order of the
transition is large and the RS solution seems to interpolate well between the
small-$\alpha$ and the asymptotic regimes, we also compare numerical simulations at
several values of $\alpha$ with the replica theory prediction [?].



2 The Binary Channel 

We consider a single-layer perceptron, or channel, with $N$ continuous input
neurons whose states $\xi_i$ define a vector $\vec\xi = \{\xi_i\}_{i=1}^{N}$ representing the signal
received from the environment. The output layer has $P$ binary neurons, whose
values $v_\mu = \pm 1$ compose the vector $\vec v = \{v_\mu\}_{\mu=1}^{P}$ that represents the
code. Between the signal and the code there is an encoder, given by a set of
synaptic couplings $J = \{J_{i\mu}\}$.

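The inputs take values drawn from an $N$-dimensional Gaussian probability
distribution, unbiased ($\langle \xi_i \rangle = 0$) and with correlation matrix $(C_X)_{ij} = \langle \xi_i \xi_j \rangle$,

$$ p_\xi(\vec\xi) = \frac{e^{-\frac{1}{2} \vec\xi^{T} C_X^{-1} \vec\xi}}{\sqrt{(2\pi)^N |C_X|}}, \qquad |C_X| \equiv \det C_X, \qquad (2) $$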

which, using a convenient shorthand, can be expressed as $\vec\xi = \mathcal{N}(0, C_X)$. (Here $X$
is a correlation parameter between input neurons, to be defined more precisely
later.) The transfer function is deterministic, so that $v_\mu = \mathrm{sign}(h_\mu)$, where

$$ h_\mu = \sum_{i=1}^{N} J_{i\mu}\, \xi_i, \qquad (3) $$

and the $\vec J_\mu$ denote the synaptic weight vectors linking the signal $\vec\xi$ to each
output neuron $\mu$. They form a set of independent random vectors $\{\vec J_\mu\}_{\mu=1}^{P}$,
each distributed according to an $N$-dimensional Gaussian probability, with mean
$\langle J_{i\mu} \rangle = 0$ and correlation matrix $\Gamma^\mu$, whose elements are $\Gamma^\mu_{ij} = \langle J_{i\mu} J_{j\mu} \rangle$.
This means that $\vec J_\mu = \mathcal{N}(0, \Gamma^\mu)$.

We are mainly interested in computing the averaged mutual information per
input unit in the thermodynamic limit, i.e. $i = \lim_{N\to\infty} \frac{1}{N} \langle I[p_v, p_\xi] \rangle_J$. A
useful result to have in mind is that the MI in eq. (1) is the difference

$$ I[p_v, p_\xi] = H[p_v] - H[p_{v|\xi}], \qquad (4) $$

where $H[p_v] = -\langle \log p_v \rangle_v$ is the entropy of the output, while
$H[p_{v|\xi}] = -\langle \langle \log p_{v|\xi} \rangle_{v|\xi} \rangle_\xi$ is the conditional entropy of the output given the input
(averaged over the input). The code $\vec v$, given a fixed signal $\vec\xi$, has the
conditional probability $p_{v|\xi} = p_{v,\xi}/p_\xi$. Since a deterministic channel clearly has zero
conditional entropy, in that case the MI reduces to the output entropy.
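
The last statement can be spelled out as a short worked equation. The notation $\vec v(\vec\xi)$ for the unique code produced by a given signal is introduced here only for illustration, and the final bound assumes that entropies are measured in bits.

```latex
% Deterministic channel: each signal produces exactly one code \vec v(\vec\xi)
p_{v|\xi}(\vec v \,|\, \vec\xi) = \delta_{\vec v,\, \vec v(\vec\xi)} \in \{0, 1\}
\;\Longrightarrow\;
H[p_{v|\xi}] = -\Big\langle \sum_{\vec v} p_{v|\xi} \log p_{v|\xi} \Big\rangle_{\xi} = 0
\quad (\text{using } 0 \log 0 = 0 \text{ and } 1 \log 1 = 0),
% so the MI equals the output entropy, which P binary units bound by P bits
I[p_v, p_\xi] = H[p_v] \le P
\;\Longrightarrow\;
i \le \alpha .
```

The bound $i \le \alpha$ holds because $P$ binary units can take at most $2^P$ distinct codes.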

3 Analytical results for an example 

As an example we consider a Gaussian input distribution with two-point
correlation $C_X$ given by $(C_X)_{ij} = X^{|i-j|}$. The input neurons are then less (more)
spatially correlated if $X \ll 1$ ($X \approx 1$). For $X = 1$, $C_X$ is a singular matrix,
while for $X \to 0$ $C_X$ tends towards the identity matrix. The correlations
between two synapses converging to the same output, $\Gamma^\mu$, are chosen to be equal
for all output neurons and normalised to 1, i.e. $\Gamma^\mu_{ij} = \Gamma_{ij} = \delta_{ij}$, $\forall \mu$.
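
A minimal sketch of this input ensemble, assuming the correlation matrix has elements $X^{|i-j|}$ as read from the text. The helper names `correlation_matrix` and `sample_inputs` are hypothetical, and drawing correlated signals through a Cholesky factor is one standard choice rather than necessarily the procedure used by the authors.

```python
import numpy as np

def correlation_matrix(n, x):
    """Matrix with elements x**|i-j| for i, j = 0, ..., n-1."""
    idx = np.arange(n)
    return x ** np.abs(idx[:, None] - idx[None, :])

def sample_inputs(n, x, n_samples, rng):
    """Draw zero-mean Gaussian signals with covariance correlation_matrix(n, x).

    Requires 0 <= x < 1 so that the matrix is positive definite.
    """
    chol = np.linalg.cholesky(correlation_matrix(n, x))   # C = L L^T
    white = rng.standard_normal((n_samples, n))           # uncorrelated unit Gaussians
    return white @ chol.T                                 # rows have covariance L L^T

# The two limiting cases quoted in the text.
print(correlation_matrix(3, 0.0))                          # identity matrix
print(np.linalg.matrix_rank(correlation_matrix(3, 1.0)))   # rank 1, hence singular
```

At $X = 1$ every entry equals one, so the matrix has rank one and the Cholesky-based sampler above no longer applies.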

For small $\alpha$, the MI can be expanded as $i(\alpha) \sim a_0 + a_1 \alpha + a_2 \alpha^2 + a_3 \alpha^3$. One
trivially obtains that $a_0 = 0$ and $a_1 = 1$. Both the RS solution (see [?]) and
the analytical solution (see [?]) give the same value of $a_2$ (eq. (5)), which is
expressed in terms of a parameter $\gamma$. On the other hand, the two techniques disagree in their
predictions for $a_3$. The RS method yields eq. (6), while (following the methods of [?])
one can easily verify that the analytical technique gives eq. (7).

The RS solution also gives rise to predictions for strongly correlated inputs.
For $X \simeq 1$ one obtains a power law in $\alpha$,

$$ i \propto \alpha^{\nu}, \qquad \nu = 2/3, \qquad (8) $$

with a prefactor that depends on $1 - X$. Finally, the same solution shows
logarithmic behaviour for large $\alpha$ (eq. (9)).





Figure 1: The coefficient $a_2(X)$ obtained from simulations compared with the
RS prediction.



4 Method 

The first step of the simulation entails choosing a coupling sample $J$ at random.
A given choice of $J$ will be labelled as the sample $s$, and the total number of
samples will be denoted by $S$. Next, signals $\vec\xi$ are drawn in order to obtain
several codewords $\vec v$. A histogram $p_v^s(\vec v)$ is then constructed from these states,
representing the probability $p_v(\vec v)$ and allowing us to estimate the output
entropy per input unit:

$$ i_s = -\frac{1}{N} \sum_{\vec v} p_v^s(\vec v) \log p_v^s(\vec v). \qquad (10) $$

Clearly, this approximation to the true code entropy $H[p_v]$ will improve as the
number of drawn signals increases. In practice, we evaluated $p_v^s(\vec v)$ using about
100 different input states $\vec\xi$. The final step is to calculate the average over an
ensemble of $J$, to get $i = \frac{1}{S} \sum_{s=1}^{S} i_s$.
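
The procedure above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: the function names are hypothetical, the correlation matrix is taken as $X^{|i-j|}$, the couplings are uncorrelated unit Gaussians, entropies are in bits, and the number of signals `n_signals` is left as a free parameter. The histogram (plug-in) entropy estimate is biased downwards unless `n_signals` is large compared with the number of occupied codewords.

```python
import numpy as np

def correlation_matrix(n, x):
    """Assumed input correlations: elements x**|i-j|."""
    idx = np.arange(n)
    return x ** np.abs(idx[:, None] - idx[None, :])

def entropy_per_input(n, p, x, n_signals, rng):
    """One coupling sample s: histogram estimate of i_s = H[p_v] / N in bits."""
    chol = np.linalg.cholesky(correlation_matrix(n, x))
    couplings = rng.standard_normal((n, p))                  # J_{i mu}, one column per output
    signals = rng.standard_normal((n_signals, n)) @ chol.T   # correlated Gaussian signals
    codes = (signals @ couplings > 0).astype(np.int64)       # v_mu = sign(h_mu), stored as 0/1
    labels = codes @ (1 << np.arange(p))                     # each codeword as an integer
    _, counts = np.unique(labels, return_counts=True)        # histogram over codewords
    freq = counts / n_signals
    return float(-(freq * np.log2(freq)).sum() / n)

def mutual_information_per_input(n, p, x, n_samples, n_signals, seed=0):
    """Average of i_s over n_samples couplings; returns (mean, standard deviation)."""
    rng = np.random.default_rng(seed)
    estimates = [entropy_per_input(n, p, x, n_signals, rng) for _ in range(n_samples)]
    return float(np.mean(estimates)), float(np.std(estimates))

if __name__ == "__main__":
    # Hypothetical parameter choice: alpha = P / N = 0.2.
    mean, spread = mutual_information_per_input(n=50, p=10, x=0.5,
                                                n_samples=5, n_signals=20000)
    print(mean, spread)
```

Encoding each codeword as an integer keeps the histogram compact for the values of $P$ considered in the text; for much larger $P$ the number of possible codewords, $2^P$, makes a reliable histogram estimate impractical.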

5 Results from simulations 

Here we present studies of three relevant regions in parameter space $(\alpha, X)$: (1)
small $\alpha$; (2) intermediate values of $\alpha$ and strong correlations, $X \simeq 1$; and (3)
large $\alpha$. The results are compared with the theoretical predictions, eqs. (5)-(9).
The self-averaging property of the MI is also analysed.

In the small-$\alpha$ case, we calculated $i(\alpha)$ for values of $\alpha$ ranging from 0.05
to 0.2: we fixed $P = 10$ and took a variable number of inputs in the interval
$N = 50, \dots, 200$. We repeated this process for several values of the correlation
parameter, $X = 0.1, 0.2, \dots, 0.9$. Using the fact that $i(\alpha = 0) = 0$ when $N \to \infty$, we
performed a linear regression on $i$, $i(\alpha, X)/\alpha = a_1(X) + a_2(X)\,\alpha$, and obtained
the coefficients $a_1$, $a_2$ as functions of $X$. For all values of $X$, $a_1 \simeq 1$, which is in
agreement with the theory. The function $a_2(X)$ is plotted in fig. 1, in comparison
with the theoretical prediction (eq. (5)). The good agreement indicates that the
thermodynamic-limit solution is a good estimate even for $N$ not very large,
as long as the low-loading expansion applies.

In order to evaluate $a_3$, we set $a_1 = 1$ (to avoid increasing the error through
a larger number of fitted parameters). We analysed the MI only for the
value $X = 0.5$. From eq. (5) the theoretical value $a_2 = -0.244$ can be obtained,
and from eqs. (7) and (6) we have the analytical value $a_3^{\rm an} = 0.227$ and the RS
value $a_3^{\rm RS} = 0.063$, respectively. The
simulation itself was done using several values of $P$, and for each of them a
linear regression was performed using $[i(\alpha)/\alpha - 1]/\alpha = a_2 + a_3\,\alpha$.
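
The two regressions described above can be sketched with an ordinary least-squares line fit. The function names are hypothetical, and the data below are synthetic: they are generated from the truncated expansion with the quoted theoretical coefficients for $X = 0.5$ and are not simulation results.

```python
import numpy as np

def fit_low_loading(alpha, i_values):
    """Fit i/alpha = a1 + a2*alpha and return (a1, a2)."""
    a2, a1 = np.polyfit(alpha, i_values / alpha, 1)   # polyfit returns slope first
    return float(a1), float(a2)

def fit_third_order(alpha, i_values):
    """With a1 fixed to 1, fit (i/alpha - 1)/alpha = a2 + a3*alpha and return (a2, a3)."""
    a3, a2 = np.polyfit(alpha, (i_values / alpha - 1.0) / alpha, 1)
    return float(a2), float(a3)

# Synthetic check only: cubic expansion with a2 = -0.244 and a3 = 0.227.
alpha = np.linspace(0.05, 0.2, 7)
i_synthetic = alpha - 0.244 * alpha**2 + 0.227 * alpha**3

print(fit_low_loading(alpha, i_synthetic))   # a2 here absorbs part of the cubic term
print(fit_third_order(alpha, i_synthetic))   # recovers the a2 and a3 used above
```

Fixing $a_1 = 1$ removes one fitted parameter, which is the motivation given in the text for the second form of the regression.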

In this case the corrections due to finite-size effects are noticeable. To
observe convergence to the asymptotic regime we considered several values of $P$,
namely $P$ = 3, 5, 7, 10, 12 and 15. We averaged for each over, respectively,
$S$ = 2500, 1000, 200, 100, 50 and 20 samples of $J$. The number of samples $S$ was
taken larger as the number of outputs $P$ became smaller, to compensate for the
lack of statistics. Numerical evaluation of $a_3$ is extremely costly in
computational time; this is the reason why we restricted ourselves to a single value of $X$
and did not consider values of $P$ larger than 15.

Figure 2: The coefficients $a_2$ (a) and $a_3$ (b) for several values of $P$, for $X = 0.5$. The
straight horizontal lines represent the theoretical values; the analytical value of $a_3$ is
compatible with the data, while the RSA coefficient lies outside the error bars.

The result is that our simulation is in agreement with the analytical
prediction. As we see in fig. 2b, the estimate of $a_3$ converges to $a_3 \simeq 0.28$ as $P$
increases. Given the error bars, this result is compatible with the analytical value
$a_3^{\rm an}$ but excludes the RS value $a_3^{\rm RS}$. For the sake of
comparison we have included fig. 2a, where the same numerical analysis is done
for $a_2$.

The results for strongly correlated inputs are shown in fig. 3a. There is
good agreement between the simulation and the power-law behaviour of eq. (8):
$\nu = 0.662 \simeq 2/3$.

The results for the large-$\alpha$ limit are presented in fig. 3b. The logarithmic
behaviour predicted in eq. (9) is observed.

To verify that the MI is self-averaging we calculated its root-mean-square
deviation $\Delta(i)$ over the samples and made a fit to the form $1/\sqrt{N}$. The good agreement
with this expression can be seen in fig. 3c. This shows that the methods of
statistical mechanics are appropriate for studying the information in binary
channels.
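
A minimal sketch of this self-averaging check, assuming the deviation is defined as $\Delta^2(i) = \frac{1}{S} \sum_s (i_s - i)^2$ and that the scaling is extracted from a straight-line fit in log-log coordinates. The function names are hypothetical, and the example curve is synthetic, built from the fitted line quoted for fig. 3c rather than from simulation data.

```python
import numpy as np

def relative_fluctuation(sample_values):
    """Delta(i)/i, with Delta^2(i) = (1/S) * sum_s (i_s - i)^2 over the samples."""
    values = np.asarray(sample_values, dtype=float)
    mean = values.mean()
    return float(np.sqrt(np.mean((values - mean) ** 2)) / mean)

def fit_power_law(n_values, y_values):
    """Fit ln y = c + slope * ln N and return (c, slope)."""
    slope, intercept = np.polyfit(np.log(n_values), np.log(y_values), 1)
    return float(intercept), float(slope)

# Synthetic curve following ln[Delta(i)/i] = -2.5 - 0.49 ln N for N = 2, ..., 14.
n_values = np.arange(2, 15, dtype=float)
synthetic = np.exp(-2.5) * n_values ** -0.49

print(fit_power_law(n_values, synthetic))   # slope near -0.5 means 1/sqrt(N) scaling
```

In an actual run, `relative_fluctuation` would be applied to the $S$ values $i_s$ obtained at each $N$ before the fit.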



6 Conclusions 

Our main result concerns the comparison between the RS [?] and analytical
[12] solutions. The difference between them can be seen in a small-$\alpha$ expansion.
Figure 3: a) The behaviour of the MI for strongly correlated inputs ($X = 0.9$),
with $P = 10$ and $N = 3, \dots, 10$. The squares are obtained from simulation, while
the plotted line is $\ln(i) = -0.395 + 0.662 \ln(\alpha)$. b) The MI for large $\alpha$,
with $N = 4$, $P = 1, \dots, 20$ and $X = 0.1$ ($\alpha \le 5$). The dots are obtained from
simulation, while the plotted curve is $i = 0.78 + 0.92 \ln(\alpha)$. c) The self-averaging
property. We took $X = 0.9$ and $P = N = 2, \dots, 14$ ($\alpha = 1$). The linear regression
gives $\ln[\Delta(i)/i] = -2.5 - 0.49 \ln(N)$, where $\Delta^2(i) = \frac{1}{S} \sum_{s=1}^{S} [i_s - i]^2$.



Numerical simulation confirms that the two solutions coincide up to second
order. At the next order the solutions are different (see eqs. (6) and (7)), and the
simulation excludes the replica calculation while it is in agreement with the
analytical one (see fig. 2).

We have also verified the conjecture that the RS solution is a good
interpolation between the small- and large-$\alpha$ behaviours [12]. In particular, the
simulation shows that for intermediate values of $\alpha$ and strongly correlated
inputs the MI behaves as $i \sim \alpha^{2/3}$ (see fig. 3a). Moreover, in fig. 3b we see that
the expected logarithmic behaviour for the large-$\alpha$ case fits the MI very well.